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Abstract. In this paper, we consider the challenge of maximizing an 
unknown function / for which evaluations are noisy and are acquired 
fT^ with high cost. An iterative procedure uses the previous measures to ac- 

T-H tively select the next estimation of / which is predicted to be the most 

^^ useful. We focus on the case where the function can be evaluated in par- 

Cn allel with batches of fixed size and analyze the benefit compared to the 

?H purely sequential procedure in terms of cumulative regret. We introduce 

t^ the Gaussian Process Upper Confidence Bound and Pure Exploration 

algorithm (GP-UCB-PE) which combines the UCB strategy and Pure Ex- 
ploration in the same batch of evaluations along the parallel iterations. 
We prove theoretical upper bounds on the regret with batches of size K 
for this procedure which show the improvement of the order of \/K for 
I I fixed iteration cost over purely sequential versions. Moreover, the mul- 

rn tiplicative constants involved have the property of being dimension-free. 

1 We also confirm empirically the efficiency of GP-UCB-PE on real and 

• synthetic problems compared to state-of-the-art competitors. 

1 Introduction 

>• 
^~v Finding the maximum of a non-convex function by means of sequential noisy 

ly-j observations is a common task in numerous real world applications. The con- 

C^ text of a high dimensional input space with expensive evaluation cost offers new 

'^ challenges in order to come up with efficient and valid procedures. This prob- 

^^ lem of sequential global optimization arises for example in industrial system 

^^ design and monitoring to choose the location of a sensor to find out the maxi- 

^^ mum response, or when determining the parameters of a heavy numerical code 

. . designed to maximize the output. The standard objective in this setting is to 

J> minimize the cumulative regret Rt, defined as the sum Xit=i (/(^*) ^ fi^t)) of 

the differences between the values of / at the points queried Xt and the true 
optimum of / noted x* . For a fixed horizon T, we refer to |20j . In the context 
C^ where the horizon T is unknown, the query selection has to deal with the explo- 

ration/exploitation tradeoff. Successful algorithms have been developed in differ- 
ent settings to address this problem such as experimental design d5 , Bayesian 
optimization P [H [HI [311 [Ml HTj , active learning HO], multiarmed bandit 
[21 [31 [TTl [25J [211 [11 [33] and in particular Hierarchical Optimistic Optimization 
algorithm, HOG [B] for bandits in a generic space, namely A'-Armed bandits. 
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In some cases, it is possible to evaluate the function in parallel with batches 
of K queries with no increase in cost. This is typically the case in the sensors 
location problem if K sensors are available at each iteration, or in the numerical 
optimization problem on a cluster of K cores. Parallel strategies have been de- 
veloped recently in [T31[3]. In the present paper, we propose to explore further 
the potential of parallel strategies for noisy function optimization with unknown 
horizon aiming simultaneously at practical efhciency and plausible theoretical 
results. We introduce a novel algorithm called GP-UCB-PE based on the Gaus- 
sian process approach which combines the benefits of the UCB policy with Pure 
Exploration queries in the same batch of K evaluations of /. The Pure Explo- 
ration component helps to reduce the uncertainty about / in order to support 
the UCB policy in finding the location of the maximum, and therefore in in- 
creasing the decay of the regret Rt at every timestep t. In comparison to other 
algorithms based on Gaussian processes and UCB such as GP-BUCB [13], the 
new algorithm discards the need for the initialization phase and offers a tighter 
control on the uncertainty parameter which monitors overconfidence. As a re- 
sult, the derived regret bounds do not suffer from the curse of dimensionality 
since the multiplicative constants obtained are dimension free in contrast with 
the doubly exponential dependence observed in previous work. We also mention 
that Monte-Carlo simulations can be proposed as an alternative and this idea 
has been implemented in the Simulation Matching with UCB policy (SM-UCB) 
algorithm W\ which we also consider for comparison in the present paper. Un- 
like GP-BUCB, no theoretical guarantees for the SM-UCB algorithm are known 
for the bounds on the number of iterations needed to get close enough to the 
maximum, therefore the discussion will be reduced to empirical comparisons over 
several benchmark problems. The remainder of the paper is organized as follows. 
We state the background and our notations in Section [2] We formalize the Gaus- 
sian Process assumptions on /, and give the definition of regret in the parallel 
setting. We then describe the GP-UCB-PE algorithm and the main concepts in 
Section [3] We provide theoretical guarantees through upper bounds for the cu- 
mulative regret of GP-UCB-PE in Section |4j We finally show comparisons of our 
method and the related algorithms through a series of numerical experiments on 
real and synthetic functions in Section [5] 

2 Problem Statement and Background 

2.1 Sequential Batch Optimization 

We address the problem of finding in the lowest possible number of iterations 
the maximum of an unknown function / : A" — > M where X a M'', denoted by 

fix*) =maxf{x) . 

The arbitrary choice of formulating the optimization problem as a maximization 
is without loss of generality, as we can obviously take the opposite of / if the 
problem is a minimization one. At each iteration t, we choose a batch of K points 
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in X called the queries {x^'}o^fc<i<', and then observe simultaneously the noisy 
values taken by / at these points, 

where the e^ are independent Gaussian noise A/'(0, a'^). 
2.2 Objective 



Assuming that the horizon T is unknown, a strategy has to be good at any 

ier 
point queried a:^, 



ik) 

iteration. We denote by rj the difference between the optimum of / and the 



We aim to minimize the cumulative regret, 



^T = Yj 



' t ; 
t<T 



which is the standard objective with these formulations of the problem [i5j. We 
focus on the case where the cost for a batch of evaluations of / is fixed. The loss 
r^ incurred at iteration t is then the simple regret for the batch [7], defined as 

K ■ (k) 

Tj = mm Tj 

k<K 

An upper bound on R^ gives an upper bound of -t^- on the minimum gap 
between the best point found so far and the true maximum. We also provide 
bounds on the usual cumulative regret. 



^TK = E E ri'' ^ 



t<Tk<K 

which model the case where all the queries in a batch should have a low regret. 

2.3 Gaussian Processes 

In order to analyze the efficiency of a strategy, we have to make some assumptions 
on /. We want extreme variations of the function to have low probability. 

Modeling / as a sample of a Gaussian Process (GP) is a natural way to 
formalize the intuition that nearby location are highly correlated. It can be seen 
as a continuous extension of multidimensional Gaussian distributions. We said 
that a random process / is Gaussian with mean function m and non-negative 
definite covariance function (kernel) k, written 

/ ^ GP{m, k) , 
where m : X ^ R 

and /c : A" X A" ^ M+ , 
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Fig. 1. Gaussian Process inference of tlie posterior mean p. (blue line) and deviation a 
based on four realizations (blue crosses). The high confidence region (area in grey) is 
delimited by /+ and /^. 



when for any finite subset of locations the values of the random function form a 
multivariate Gaussian random variable of mean vector /x and covariance matrix 
C given by the mean m and the kernel k of the GP. That is, for all finite n and 

Xi , . . . , Xji G H. , 

(/(xi),...,/(x„))-AA(m,C), 
with ^i[xi\ = m{xi) 
and i^iXj, Xj\ -— f^iXi^ '^j) ■ 

If we have the prior knowledge that / is drawn from a GP with zero mean F] and 
known covariance (kernel) function, we can use Bayesian inference conditioned 
on the observations after T iterations to get the closed formulae for computing 
the posterior [30], which is a GP of mean and variance given at each location 
X e X by, 

nAx) = kTix)^C-'YT (1) 

and a^{x) = k{x,x) — kT(a;)'''Cy^kT(x) , (2) 

X.T = {Xf}t<T,k<K is the set of queried locations, Yt = [yt]x''eyiT ^^ *^® vector 
of noisy observations, hrix) = [k{x^,x)]^k^y^^ is the vector of covariances be- 
tween X and the queried points, and Ct = Kt + c^I with Kt = [k{x, x')]x,x'€'x.t 
the kernel matrix and I stands for the identity matrix. 
The three most common kernel functions are: 

— the polynomial kernels of degree a e N, fc(xi, X2) = (2:7a;2 + c)" , c e M 

— the (Gaussian) Radial Basis Function kernel (RBF or Squared Exponential) 

with length-scale I > 0, k{xi,X2) = exp I — "'^'-^p^" j, 

— the Matern kernel, of length-scale I and parameter i/, 



k{xi,X2) 



■yl^u 



^\\xi,X2\\ 



K,. 



^\\xi,X2\ 



r{v) V I J ""v I 

where K^ is the modified Bessel function of the second kind and order v 



(3) 



^ this is without loss of generality as the kernel k can completely define the GP |30| . 
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The Bayesian inference is represented on Figure [l] in a sample problem in di- 
mension 1. The posteriors are based on four observations of a Gaussian Process. 
The vertical height of the grey area is proportional to the posterior deviation at 
each point. 

3 Parallel Optimization Procedure 

3.1 Confidence Region 

A key property from the GP framework is that the posterior distribution at 
a location x has a normal distribution Af{flT{x),a^{x)). We can then define a 
upper confidence bound /+ and a lower confidence bound /~, such that / is 
included in the interval with high probability, 



f+(x) = flT{x) + V/3T?T-i(a;) (4) 

and /^(x) = firix) - y%^d-T-i{x) , (5) 

with Pt e OilogT) defined in Section [i) 

/+ and /^ are illustrated on Figure Ifl respectively by the upper and lower 
envelope of the grey area. The region delimited in that way, the high confidence 
region, contains the unknown / with high probability. This statement will be a 
main element in the theoretical analysis of the algorithm in Section [4] 

3.2 Relevant Region 

We define the relevant region 9\t being the region which contains x* with high 
probability. Let y' be our lower confidence bound on the maximum, 

y't = /r(2;*), where x* = argmax/t"(x) . 

xeX 

y' is represented by the horizontal dotted green line on Figure [21 $Kt is defined 
as, 

mt = lx€X\f+ix)^yf 



f)\t discard the locations where x* does not belong with high probability, ft is 
represented in green on Figure l2J We refer to [T7j for related work in the special 
case of deterministic Gaussian Process Bandits. 

In the sequel, we will use a modified version of the relevant region which also 
contains argmax^g_;j^ f^+i{x) with high probability. The novel relevant region is 
formally defined by 

mt = {xeX\ Jlt{x) + 2VA;^5t-i(x) ^yl] . (6) 

Using "tK^ instead of 9^t guarantees that the queries at iteration t will leave an 
impact on the future choices at iteration i + I. 
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Algorithm 1: GP-UCB-PE 



for t = 0, . . . , T do 

Compute fit and at with Eqll and Eql2 
x? ^ argmax^g,^. f+ (x) 
Compute JHt+ with Eqje] 
for A; = 1,..., A'- 1 do 

Compute al ' with Eq 2 
Xt ^ argmax^^j^+ al '(x) 

Query {xt}k<K 



3.3 GP-UCB-PE 

We present here the Gaussian Process Upper Confidence Bound with Pure Ex- 
ploration algorithm, GP-UCB-PE, a novel algorithm combining two strategies to 
determine the queries {Xf}k<K for batches of size K. The first location is chosen 
according to the GP-UCB rule. 



argmax/+(a;) . (7) 



X€X 



This single rule is enough to tackle the exploration/exploitation tradeoff. The 
value of /3t balances between exploring uncertain regions (high posterior variance 
af{x)) and focusing on the supposed location of the maximum (high posterior 
mean jlt{x)). This policy is illustrated with the point x^ on Figure pi 

The K — 1 remaining locations are selected via Pure Exploration restricted 
to the region D\f . We aim to maximize /t(X( ~^), the information gain about / 
by the locations Xj ~^ = {x'l}i<ck<K [I2- Formally, /t(X) is the gain in entropy 
when knowing the values of the observations Y at X, conditioned on X^ the 
observations we have seen so far. 

It{X) = H(Y)-H(Y\Xt). (8) 

Finding the K —1 points that maximize It for any integer K is known to be N P- 
complete [23]. However, due to the submodularity of /( [T9j, it can be efficiently 
approximated by the greedy procedure which selects the points one by one and 
never backtracks. For a Gaussian distribution, H{M{n,C)) = ^ logdet(27reC). 
We thus have /t(X) G ©(logdet S), where S is the covariance matrix of X. For 
GP, the location of the single point that maximizes the information gain is easily 
computed by maximizing the posterior variance. For all 1 ^ A: < A' our greedy 
strategy selects the following points one by one, 

x'l = argmaxCTj ' {x) , (9) 

where a\ is the updated variance after choosing {x^ }k'<k- We use here the fact 
that the posterior variance does not depend on the values yt of the observations. 
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Fig. 2. Two queries of GP-UCB-PE on the previous example. The lower confidence 
bound on the maximum is represented by the horizontal dotted green line at y*. The 
relevant region *H is shown in light green (without edges). The first query x^ is the 
maximizer of /^ . We show in dashed line the upper and lower bounds with the update 
of a after having selected x". The second query x^ is the one maximizing the uncertainty 
inside VK'^ , an extension of iH which is not illustrated here. 



but only on their position x^. One such point is illustrated with x^ on Figure pi 
These K — 1 locations reduce the uncertainty about /, improving the guesses of 
the UCB procedure by x^. The overall procedure is shown in Algorithm 111 

3.4 Numerical Complexity 

Even if the numerical cost of GP-UCB-PE is insignificant in practice compared to 
the cost of the evaluation of /, the complexity of the exact computations of the 
variances (EqJ2]) is in 0{'n?) and might by prohibitive for large n = tK. One can 
reduce drastically the computation time by means of Lazy Variance Calculation 
[13], built on the fact that at{x) always decreases when t increases for all a; e A". 
We further mention that efficient approximated inference algorithms such as the 
EP approximation and MCMC sampling ^26j can be used in order to face the 
challenge of large n. 



4 Regret Bounds 

4.1 Main Result 

The main theoretical result of this article is the upper bound on the regret 
formulated in Theorem IT] We need to adjust the parameter Pt such that /(x) 
is contained by the high confidence region for all iterations t with probability at 
least 1 — i5 for a fixed < 5 < 1. 
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— If A" is finite, then we choose /3t = Slogd^Yj ^) where ttj > such that 

H^o T^r^ = 1- We set for example /3t = 21og ( \X\ t^^). 

— li X cz [0,r]'^ is compact and convex, we need the following bounds on the 
derivatives of /, 



3a, b>0, Vj < d, Pr ( sup 
\xeX 



df 



> L] ^ ae ^ 



Then, we can set 



A = 21og (^i^^) + 2dlog Udbr^log (^) j . 

The regret bound are expressed in term of ^tk, the maximum information 
gain (Eq. Isl) obtainable by a sequence of TK queries. 

74 = max /o(X) . 

XcA',|X| = t 

Under these assumptions, we obtain the following result. 

Theorem 1. Fix < S < 1 and consider the calibration of Pt defined as above, 
assuming f ~ GP(0,k) with bounded variance, Vx e X, k(x,x) < 1, then the 



cumulative regret incurred by GP-UCB-PE on f is bounded by 0{\j^j3t^tk 
whp. More precisely, with Ci = wTT+F^' '^"'^ ^2 = ^; VT, 



Pv{r^ ^ ^JCi^PtItk + C2 j > 1 - 5 



For Rtk we obtain similar bounds with C^ = -, — rrr— ^2y, 

^ ^ ^ log(l + cr ^) ' 



Pr (^Rtk < \/CiT(3tJtk + C2) > 1 



5 



4.2 Discussion 

When K « T, the probabilistic bound is better than the one of sequential 
GP-UCB by an order of \/K. Compared to [13], we remove the need of the 
initialization phase. GP-UCB-PE does not need either to multiply the uncertainty 
parameter /St by exp(7™]J) where 7™]J is equal to the maximum information 
gain obtainable by a sequence of TK queries after the initialization phase. The 
improvement can be doubly exponential in the dimension d in the case of RBF 
Kernels. To the best of our knowledge, no regret bounds have been proven for 
the Simulation Matching algorithm. 

The values of 'Jtk for different common kernel are reported in Table [l] 
where d is the dimension of the space considered and a = 2!^+d(d+i) ^ 1; '^ 
being the Matern parameter. We also compare on Table [T] the general forms of 
the bounds for the regret obtained by GP-UCB-PE and GP-BUCB up to con- 
stant terms. The cumulative regret we obtained with RBF Kernel is of the form 

dU^{logTK)A against O (exp{{^)'^) ^J^^{]ogTKY) for GP-BUCB. 
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Table 1. General Forms of Regret Bounds for GP-UCB-PE and GP-BUCB 

GP-UCB-PE GP-BUCB 



7TK 

C 









Jxx 


/ T log T „ 


r^ / T log TK 


Kernel 


Linear RBF Matern 



dlogTK {\ogTK)''+^ {TK)°'\ogTK 
exp(f) exp((f)'*) 



4.3 Proofs of the main result 

In this section, we analyze theoretically the regret bounds for the GP-UCB-PE 
algorithm. We provide here the main steps for the proof of Theorem IT] On one 
side the UCB rule of the algorithm provides a regret bounded by the information 
we have on / conditioned on the values observed so far. On the other side, the 
Pure Exploration part gathers information and therefore accelerates the decrease 
in uncertainty. We refer to [T3] for the proofs of the bounds for GP-BUCB. 

For the sake of concision, we introduce the notations cr^ for CTj (x^) and af 
for at_i(x°). We simply bound r^ the regret for the batch at iteration t by the 
simple regret rl for the single query chosen via the UCB rule. We then give 
a bound for r). which is proportional to the posterior deviations <7°. Knowing 
that the sum of all (cr^)^ is not greater than Ci'Jtk, we want to prove that the 
sum of the (a^)'^ is less than this bound divided by K. The arguments are based 
on the fact that the posterior for f{x) is Gaussian, allowing us to choose /3t such 
that 

\fxeX,\ft<T, f{x)e[f,-ix)J+{x)] 

holds with high probability. Here and in the following, "with high probability" 
or whp means "with probability at least 1 — 5" for any < S < 1, the definition 
of /?( being dependent of 6. 



Lemma 1. For finite X , we have r^ ^ r^ < 2-\//5tcr^, and for compact and 
convex X following the assumptions 
holds with probability at least \ — 5. 



convex X following the assumptions of Theorem\l\ r^ ^ r^ ^ 2-\//5tcr^ + p-, 



(0) 



We refer to [31] (Lemmas 5.2, 5.8) for the detailed proof of the bound for r. 

Now we show an intermediate result bounding the deviations at the points 
a^t+i by the one at the points xf~^. 



Lemma 2. The deviation of the point selected by the UCB policy is bounded by 
the one for the last point selected by the PE policy at the previous iteration, whp, 

.0 

t+i 



Vt < T, a°, sg a^ 
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Proof. By the definitions of Xf_f,i (Eqilh, we fiave f^_^-j^{x^j^i) ^ /j^]^(x*). Tlien, 
we know witli liigh probability that Va; e X,\ft < T, f^_^^{x) ^ f^{x). We can 
therefore claim whp /j^]^(x^_^j) 5= y*, and thus that x^^^ e dX'^ whp. 

We have as a result by the definition of x^^^ (Eq 9| that a]. ' (x^^^) < 



al {Xf ~ ) whp. Using the "Information never hurts" principle [25^ , we know 
that the entropy of f{x) for all location a; decreases while we observe points Xt. 
For GP, the entropy is also a non-decreasing function of the variance, so that 

We thus prove cr°_^j ^ f^t~^- 

Lemma 3. The sum of the deviations of the points selected by the UCB policy 
are bounded by the one for all the selected points divided by K , whp, 

T-l , T-lif-l 

t=0 t=0 fc=0 

Proof. Using Lemma and the definitions of the x'l, we have that ct^j^^ < a^ for 

all k ^ 1. Summing over k, we get for all t ^ 0, cr^ + {K — ^)(Jt+i ^ Z;fc=o '^t- 
Now, summing over t and with ctq 5= and aj^ 5= 0, we obtain the desired result. 

Next, we can bound the sum of all posterior variances {(J^)'^ via the maximum 
information gain for a sequence of TK locations. 

Lemma 4. The sum of the variances of the selected points are bounded by a 
constant factor times Jtk, ^C'l 6 H^j Yjt<T^k<K('^t)'^ ^ C'iTtx where ^tk 
is the maximum information gain obtainable by a sequential procedure of length 
TK. 



Proof. We know that the information gain for a sequence of T locations Xt can 
be expressed in terms of the posterior variances (at-iixt))'^ . The deviations 
a^ being independent of the observations y^', the same equality holds for the 
posterior variances {al (x^))^. See Lemmas 5.3 and 5.4 in [ST] for the detailed 
proof, giving C[ = i„g(4^-^) - 

Lemma 5. The cumulative regret can be bound in terms of the maximum, in- 
formation gain, whp, 3Ci,C2 e M, 
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Proof. Using the previous lemmas and the fact that /3t ^ (3t for ah t ^ T, we 
have in the case of finite X, whp 



2 rf s; ;^ 2V^^t" , by Lemma [T] 

t<T 

< 2V^^ E E ^' ' by Lemma [3] 



t<T t<T 



K 

t<T k<K 



2V^— TK Yi Yi ('^*')^ ' ^y Cauchy-Schwarz 



^ - 

Y t<Tk<K 

^ 2^/(iT^^JTKC[^TK , by Lemma g] 

/r 4 

"== \ I^^iPtItk with Ci = :^ — — ^ 

V K log(l + cr^-^) 

For compact and convex X , a similar reasoning gives 



Rt ^\j JtCiPtItk + C2 with C2 = — = < 2 

Lemma p^ conclude the proof of Theorem n] for the regret R^ . The analysis 
for Rtk is simpler, using the Lemma [6] which bounds the regret for the Pure 
Exploration queries, leading to Ci = j^ ii?^-2) ■ 

Lemma 6. The regret for the queries x^ selected by Pure Exploration in 5H^ 
are bounded whp by, Gs/JStcr^ . 

Proof. As in Lemma [Tl we have whp, for all t ^ T and fc > 1, 

rf ) =g Mx*) + ^/platix*) - /i(4^) + VA^' 

^ f{x') + 2V^(Tf (x*) - jl{xt) + \/%a-^ by definition of x' 

^ ll{x'^) + 2v7?t7icrf + 2-s/J3ta^ - fl{x'^) + ^to\ by definition of $R+ 

< 3V^crf + 2-\JJt<Jt + -s/JtC^t by definition of (3t+i 

To conclude the analysis of Rtk and prove Theorem [T] it suffices to use then 
the last three steps of Lemma [5] 

5 Experiments 

5.1 Protocol 

We compare the empirical performances of our algorithm against the state of 
the art of global optimization by batches, GP-BUCB ^ and SM-UCB [3]. The 
tasks used for assessment come from three real applications and two synthetic 
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(a) Himmelblau (b) Gaussian mixture 

Fig. 3. Visualization ol the synthetic lunctions used for assessment 



problems described here. The results are shown in Figure |4] For all datasets 
and algorithms, the size of the batches K was set to 10 and the learners were 
initialized with a random subset of 20 observations (xi, yi). The curves on Figure 
Wl show the evolution of the regret i?f^ in term of iteration t. We report the 
average value with the confidence interval over 64 experiments. The parameters 
of the algorithm, like the bandwidth of the RBF Kernel were chosen as the best 
parameters found by maximization of the posterior likelihood. 

5.2 Description of data sets 

Generated GP. The Generated GP functions are random GPs drawn from a 
Matern kernel (Eq. pk in dimension 2, with the kernel bandwidth set to j, the 
Matern parameter v = 3 and noise variance a^ set to 1. 

Gaussian Mixture. This synthetic function comes from the addition of three 2-D 
Gaussian functions, at (0.2,0.5), (0.9,0.9), and the maximum at (0.6,0.1). We 
then perturb these Gaussian functions with smooth variations generated from a 
Gaussian Process with Matern Kernel and very few noise. It is shown on Figure 
|3(b)[ The highest peak being thin, the sequential search for the maximum of this 
function is quite challenging. 

Himmelblau function. The Himmelblau task is another synthetic function in 
dimension 2. We compute a slightly tilted version of the Himmelblau's function, 
and take the opposite to match the challenge of finding its maximum. This 
function presents four peaks but only one global maximum. It gives a practical 
way to test the ability of a strategy to manage exploration/exploitation tradeoffs. 



It is represented in Figure 3(a) 



Mackey-Glass function. The Mackey-Glass delay-differential equation [j is a 
chaotic system in dimension 6, but without noise. It models real feedback systems 



http : //www. scholarpedia. org/article/Mackey-Glass_equation 
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2 4 6 8 in 12 14 16 18 20 
Iteration t 

(a) Generated GP 



1 6 8 10 12 14 16 18 20 
Iteration ( 

(b) Himmelblau 




5 10 15 20 25 30 35 
Iteration t 

(c) Gaussian Mixture 




5 in 15 2n 25 30 
Iteration t 

(d) Mackey- Glass 




2 4 6 

Iteration t 



UCB-PE 
5 10 15 20 25 30 
Iteratiotr t 



(e) Tsunamis 



(f) Abalone 



Fig. 4. Experiments on several real and synthetics tasks. The curves show the decay of 
the mean of the simple regret rf^ with respect to the iteration t, over 64 experiments. 
We show with the translucent area the confidence intervals. 



and is used in physiological domains such as hematology, cardiology, neurology, 
and psychiatry. The highly chaotic behavior of this function makes it an ex- 
ceptionally difficult optimization problem. It has been used as a benchmark for 
example by |16) . 

Tsunamis. Recent post-tsunami survey data as well as the numerical simula- 
tions of [21] have shown that in some cases the run-up, which is the maximum 
vertical extent of wave climbing on a beach, in areas which were supposed to be 
protected by small islands in the vicinity of coast, was significantly higher than 
in neighboring locations. Motivated by these observations 32^ investigated this 
phenomenon by employing numerical simulations using the VOLNA code [14] 
with the simplified geometry of a conical island sitting on a flat surface in front 
of a sloping beach. Their setup was controlled by five physical parameters and 
their aim was to find with confidence and with the least number of simulations 
the maximum run-up amplification on the beach directly behind the island, com- 
pared with the run-up on a lateral location, not infiucnccd by the presence of 
the island. Since this problem is too complex to treat analytically, the authors 
had to solve numerically the Nonlinear Shallow Water Equations. 

Abalone. The challenge of the Abalone dataset is to predict the age of a specie of 
sea snails from physical measurements. It comes from the study by [29 and it is 
provided by the UCI Machine Learning Repository]^ We use it as a maximization 
problem in dimension 8. 



http : //eir chive . ics . uci . edu/ml/datasets/Abalone 
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5.3 Comparison of algorithms 



The algorithm SM — Simulation Matching — described in [Ij, with UCB base 
policy, has shown similar results to GP-UCB-PE on synthetic functions (Figures 
4(a) |4(b)[[4(c) ) and even better results on chaotic problem without noise (Figure 
4(d) I, but performs worse on real noisy data (Figures 4(e)[|4(f)"|). On the contrary, 



the initialization phase of GP-BUCB leads to good regret on difhcult real tasks 
(Figure 4(e)[), but looses time on synthetic Gaussian or polynomial ones (Figures 



4(a)[ 4(b)[ 4(c) ). The number of dimensions of the Abalone task is already a 



limitation for GP-BUCB with the RBF kernel, making the initialization phase 
time-consuming. The mean regret for GP-BUCB converges to zero abruptly after 
the initialization phase at iteration 55, and is therefore not visible on Figure 
|4(f)[ as for 4(c) where its regret decays at iteration 34. 



GP-UCB-PE achieves good performances on both sides. We obtained better 
regret on synthetic data as well as on real problems from the domains of physics 
and biology. Moreover, the computation time of SM was two order of magnitude 
longer than the others. 

6 Conclusion 

We have presented the GP-UCB-PE algorithm which addresses the problem of 
finding in few iterations the maximum of an unknown arbitrary function observed 
via batches of K noisy evaluations. We have provided theoretical bounds for the 
cumulative regret obtained by GP-UCB-PE in the Gaussian Process settings. 
Through parallelization, these bounds improve the ones for the state-of-the-art 
of sequential GP optimization by a ratio of s/K , and are strictly better than 
the ones for GP-BUCB, a concurrent algorithm for parallel GP optimization. We 
have compared experimentally our method to GP-BUCB and SM-UCB, another 
approach for parallel GP optimization lacking of theoretical guarantees. These 
empirical results have confirmed the effectiveness of GP-UCB-PE on several ap- 
pHcations. 

The strategy of combining in the same batch some queries selected via Pure 
Exploration is an intuitive idea that can be applied in many other methods. 
We expect for example to obtain similar results with the Maximum Expected 
Improvement policy (MEI). Any proof of regret bound that relies on the fact 
that the uncertainty decreases with the exploration should be easily adapted to 
a paralleled extension with Pure Exploration. 

On the other hand, we have observed in practice that the strategies which 
focus more on exploitation often lead to faster decrease of the regret, for example 
the strategy that uses K times the GP-UCB criterion with updated variance. 
We conjecture that the regret for this strategy is unbounded for general GPs, 
justifying the need for the initialization phase of GP-BUCB. However, it would 
be relevant to specify formally the assumptions needed by this greedy strategy 
to guarantee good performances. 
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